Bayes' theorem

In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule) describes how to update the probability of an event in the light of new knowledge. To that end, the theorem relates the updated probability P(A|B), the conditional probability of A given the new knowledge B, to the probabilities P(A) and P(B) of A and B and to the conditional probability P(B|A) of B given A. The theorem is named for Thomas Bayes (pronounced /ˈbeɪz/ or "bays").[1] In its most common form, Bayes' theorem is:

P(A|B) = \frac{P(B | A)\, P(A)}{P(B)}. \,

Introductory example

Suppose someone tells you he had a nice conversation with a person on the train. Knowing nothing else, the probability that he spoke with a woman is 50%. If he also tells you that this person was going to visit a quilt exhibition, it becomes far more likely than 50% that he spoke with a woman. Call W the event "he spoke with a woman" and Q the event "the person was a visitor of the quilt exhibition". Then P(W) = 0.50, but with the knowledge of Q the updated value is P(W|Q), which may be calculated with Bayes' formula as:

P(W|Q)=\frac{P(Q|W)P(W)}{P(Q)}=\frac{P(Q|W)P(W)}{P(Q|W)P(W)+P(Q|M)P(M)},

in which M ("he spoke with a man") is the complement of W. As P(M) = P(W) = 0.5 and P(Q|W) \gg P(Q|M), the updated value will be quite close to 1.
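
As a numerical sketch of this update (in Python), suppose for illustration that P(Q|W) = 0.10 and P(Q|M) = 0.001; these two conditional probabilities are assumptions chosen only to make the calculation concrete.

  # Sketch of the train-conversation example. P(Q|W) and P(Q|M) are assumed
  # values chosen only to make the calculation concrete.
  p_w = 0.50             # prior probability the speaker talked to a woman
  p_m = 1 - p_w          # ... or to a man
  p_q_given_w = 0.10     # assumed chance a woman was going to the quilt exhibition
  p_q_given_m = 0.001    # assumed chance a man was

  p_q = p_q_given_w * p_w + p_q_given_m * p_m     # law of total probability
  p_w_given_q = p_q_given_w * p_w / p_q           # Bayes' theorem
  print(round(p_w_given_q, 3))                    # about 0.99

With these assumed values the updated probability is about 0.99, consistent with the claim that it is quite close to 1.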

Interpretation

Bayes' theorem has two distinct interpretations. In the frequentist interpretation it relates two representations of the probabilities assigned to a set of outcomes (conceptual inverses of each other). Both can be meaningful, so if only one is known Bayes' theorem enables conversion. In the Bayesian interpretation, Bayes' theorem is an expression of how degrees of belief should rationally be updated to account for evidence. The application of this view is called Bayesian inference, and is widely applied in fields including science, engineering, medicine and law.[2] The meaning of Bayes' theorem depends on the interpretation of probability ascribed to the terms:

Frequentist interpretation

In the frequentist interpretation, probability measures the proportion of trials in which an event (a subset of the possible outcomes) occurs. Consider events A and B. Suppose we consider only trials in which A occurs. The proportion of these in which B also occurs is P(B|A). Conversely, suppose we consider only trials in which B occurs. The proportion of these in which A also occurs is P(A|B). Bayes' theorem links these two quantities, with P(A) and P(B) the overall proportions of trials with A and B.

The situation may be more fully visualised with tree diagrams. For example, suppose that some members of a population have a risk factor for a medical condition, and some have the condition. The proportion with the condition depends on whether those with or without the risk factor are examined. The proportion having the risk factor depends on whether those with or without the condition are examined. Bayes' theorem links these two representations.

Bayesian interpretation

In the Bayesian (or epistemological) interpretation, probability measures a degree of belief. Bayes' theorem then links the degree of belief in a proposition before and after accounting for evidence. For example, suppose somebody proposes that a biased coin is twice as likely to land heads as tails. Degree of belief in this might initially be 50%. The coin is then flipped a number of times to collect evidence. Belief may rise to 70% if the evidence supports the proposition.

For proposition A and evidence B,

  • P(A), the prior, is the initial degree of belief in A.
  • P(A|B), the posterior, is the degree of belief having accounted for B.
  • P(B|A)/P(B) represents the support B provides for A.

For more on the application of Bayes' theorem under the Bayesian interpretation of probability, see Bayesian inference.
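
The coin example above can be made concrete with a small sketch. Assume, purely for illustration, that the coin is flipped 10 times and lands heads 7 times, and that the only two hypotheses entertained are "biased 2:1 towards heads" and "fair", each believed equally at first.

  # Updating belief in the proposition "the coin lands heads twice as often
  # as tails" against the alternative "the coin is fair". The flip counts and
  # the 50% prior are illustrative assumptions.
  heads, tails = 7, 3
  prior_biased = 0.5
  prior_fair = 1 - prior_biased

  lik_biased = (2/3) ** heads * (1/3) ** tails    # P(data | biased 2:1)
  lik_fair = 0.5 ** (heads + tails)               # P(data | fair)

  posterior_biased = (lik_biased * prior_biased) / (
      lik_biased * prior_biased + lik_fair * prior_fair)
  print(round(posterior_biased, 2))               # about 0.69, i.e. roughly 70%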

Forms

For events

Simple form

For events A and B, provided that P(B) \ne 0,

P(A|B) = \frac{P(B | A)\, P(A)}{P(B)}. \,

In a Bayesian inference step, the probability of evidence B is constant for all models A_n. The posterior may then be expressed as proportional to the numerator:

P(A_n|B) \propto P(B|A_n) P(A_n). \,

Extended form

Often, for some partition \{A_j\} of the event space, the event space is given or conceptualized in terms of P(A_j) and P(B|A_j). It is then useful to eliminate P(B) using the law of total probability:

P(B) = {\sum_j P(B|A_j) P(A_j)}.
\implies P(A_i|B) = \frac{P(B|A_i)\,P(A_i)}{\sum\limits_j P(B|A_j)\,P(A_j)}.

In the special case of a binary partition,

P(A|B) = \frac{P(B|A)\,P(A)}{ P(B|A) P(A) + P(B|\neg A) P(\neg A)}.
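
A minimal sketch of the extended form over a three-element partition; the priors P(A_j) and likelihoods P(B|A_j) below are hypothetical values chosen for illustration.

  # Sketch of the extended form over a three-element partition {A_1, A_2, A_3}.
  # The priors P(A_j) and likelihoods P(B|A_j) are hypothetical.
  priors      = {"A1": 0.3, "A2": 0.5, "A3": 0.2}
  likelihoods = {"A1": 0.9, "A2": 0.1, "A3": 0.4}

  p_b = sum(likelihoods[a] * priors[a] for a in priors)   # law of total probability
  posterior = {a: likelihoods[a] * priors[a] / p_b for a in priors}
  print(round(p_b, 2), {a: round(q, 3) for a, q in posterior.items()})
  # 0.4 {'A1': 0.675, 'A2': 0.125, 'A3': 0.2}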

Three or more events

Extensions to Bayes' theorem may be found for three or more events. For example, for three events, two possible tree diagrams branch in the order BCA and ABC. By repeatedly applying the definition of conditional probability:

 P(A|B \cap C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} = \frac{P(C|A \cap B) \, P(A \cap B)}{P(C|B) \, P(B)} = \frac{P(C|A \cap B) \, P(B|A) \, P(A)}{P(C|B) \, P(B)}.

As previously, the law of total probability may be substituted for unknown marginal probabilities.
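
The three-event identity can be checked numerically on any small joint distribution. The sketch below builds an arbitrary (randomly generated, then normalised) joint distribution over three binary events and confirms that both sides of the formula agree.

  # Numerical check of the three-event form on a small assumed joint
  # distribution over (A, B, C); each event either occurs (True) or not (False).
  from itertools import product
  import random

  random.seed(1)
  raw = {abc: random.random() for abc in product([False, True], repeat=3)}
  z = sum(raw.values())
  joint = {abc: w / z for abc, w in raw.items()}           # P(A=a, B=b, C=c)

  def p(**fix):
      # Probability that the named events take the given truth values,
      # e.g. p(A=True, B=True) is P(A and B).
      names = ("A", "B", "C")
      return sum(pr for abc, pr in joint.items()
                 if all(abc[names.index(k)] == v for k, v in fix.items()))

  lhs = p(A=True, B=True, C=True) / p(B=True, C=True)      # P(A | B, C)
  rhs = (p(A=True, B=True, C=True) / p(A=True, B=True)     # P(C | A, B)
         * p(A=True, B=True) / p(A=True)                   # P(B | A)
         * p(A=True)                                        # P(A)
        ) / (p(B=True, C=True) / p(B=True)                 # P(C | B)
             * p(B=True))                                    # P(B)
  print(abs(lhs - rhs) < 1e-12)                              # True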

For random variables

Consider a sample space \Omega generated by two random variables X and Y. In principle, Bayes' theorem applies to the events A = \{X=x\} and B = \{Y=y\}. However, if either variable is continuous, individual points such as \{X=x\} have probability 0 (the variable has a finite probability density there rather than a positive probability mass), and the terms in the simple form vanish. To remain useful, Bayes' theorem may be formulated in terms of the relevant densities (see Derivation).

Simple form

If X is continuous and Y is discrete,

f_X(x|Y=y) = \frac{P(Y=y|X=x)\,f_X(x)}{P(Y=y)}

If X is discrete and Y is continuous,

 P(X=x|Y=y) = \frac{f_Y(y|X=x)\,P(X=x)}{f_Y(y)}

If both X and Y are continuous,

 f_X(x|Y=y) = \frac{f_Y(y|X=x)\,f_X(x)}{f_Y(y)}

Extended form

A continuous event space is often conceptualized in terms of the numerator terms. It is then useful to eliminate the denominator using the law of total probability. For f_Y(y), this becomes an integral:

 f_Y(y) = \int_{-\infty}^\infty f_Y(y|X=\xi )\,f_X(\xi)\,d\xi .
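
A discretised sketch of the mixed case: X is a continuous success probability with a uniform prior density on [0, 1], and Y is a single Bernoulli observation with Y = 1. The grid evaluation and all modelling choices are assumptions made only for illustration; the sum over the grid approximates the integral above.

  import numpy as np

  # Grid-based sketch: X continuous with a uniform prior density on [0, 1],
  # Y a single Bernoulli observation with Y = 1. All choices are illustrative.
  x = np.linspace(0.0, 1.0, 1001)          # grid over the support of X
  dx = x[1] - x[0]
  prior = np.ones_like(x)                  # f_X(x) = 1 on [0, 1]
  lik = x                                  # P(Y=1 | X=x) = x

  p_y = np.sum(lik * prior) * dx           # P(Y=1) via the integral above, close to 1/2
  posterior = lik * prior / p_y            # f_X(x | Y=1), proportional to x
  post_mean = np.sum(x * posterior) * dx   # close to 2/3
  print(round(float(p_y), 2), round(float(post_mean), 2))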

Bayes' rule

Under the Bayesian interpretation of probability, Bayes' rule may be thought of as Bayes' theorem in odds form.

O(A_1:A_2|B) = \Lambda(A_1:A_2|B) \cdot O(A_1:A_2)

where O(A_1:A_2) = P(A_1)/P(A_2) are the prior odds of A_1 to A_2, O(A_1:A_2|B) the corresponding posterior odds given B, and the Bayes factor (or likelihood ratio) is

\Lambda(A_1:A_2|B) = \frac{P(B|A_1)}{P(B|A_2)}.
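
A brief numerical sketch of the odds form, with hypothetical values for the prior probability and the two likelihoods; A_2 is taken to be the complement of A_1 so that the posterior odds can be converted back into a probability.

  # Odds-form sketch: prior odds times the Bayes factor gives the posterior odds.
  # The prior P(A_1) = 0.3 and the likelihoods 0.8 and 0.2 are hypothetical;
  # A_2 is taken to be the complement of A_1.
  prior_odds = 0.3 / 0.7                    # O(A_1:A_2)
  bayes_factor = 0.8 / 0.2                  # Lambda(A_1:A_2|B) = P(B|A_1)/P(B|A_2)
  posterior_odds = bayes_factor * prior_odds
  p_a1_given_b = posterior_odds / (1 + posterior_odds)     # valid since A_2 = not A_1
  print(round(posterior_odds, 2), round(p_a1_given_b, 2))  # about 1.71 and 0.63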

Derivation

For general events

Bayes' theorem may be derived from the definition of conditional probability:

P(A|B)=\frac{P(A \cap B)}{P(B)}, \text{ if } P(B) \neq 0. \!
P(B|A) = \frac{P(A \cap B)}{P(A)}, \text{ if } P(A) \neq 0. \!
\implies P(A \cap B) = P(A|B)\, P(B) = P(B|A)\, P(A). \!
\implies P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}, \text{ if } P(B) \neq 0.
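
The chain of equalities can be checked on any small joint distribution; the one below is an arbitrary assumption.

  # Check of the derivation on an assumed joint distribution over two binary events.
  joint = {("A", "B"): 0.12, ("A", "not B"): 0.18,
           ("not A", "B"): 0.28, ("not A", "not B"): 0.42}

  p_a = joint[("A", "B")] + joint[("A", "not B")]       # P(A) = 0.30
  p_b = joint[("A", "B")] + joint[("not A", "B")]       # P(B) = 0.40
  p_ab = joint[("A", "B")]                              # P(A and B) = 0.12

  p_a_given_b = p_ab / p_b                              # 0.30
  p_b_given_a = p_ab / p_a                              # 0.40
  # Both routes to P(A|B) agree:
  print(round(p_a_given_b, 3), round(p_b_given_a * p_a / p_b, 3))   # 0.3 0.3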

For random variables

For two continuous random variables X and Y, Bayes' theorem may be analogously derived from the definition of conditional density:

f_X(x|Y=y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}
f_Y(y|X=x) = \frac{f_{X,Y}(x,y)}{f_X(x)}
\implies f_X(x|Y=y) = \frac{f_Y(y|X=x)\,f_X(x)}{f_Y(y)}.

Examples

Frequentist example

An entomologist spots what might be a rare subspecies of beetle, due to the pattern on its back. In the rare subspecies, 98% have the pattern. In the common subspecies, 5% have the pattern. The rare subspecies accounts for only 0.1% of the population. How likely is the beetle to be rare?

From the extended form of Bayes' theorem,

\begin{align}P(\text{Rare}|\text{Pattern}) &= \frac{P(\text{Pattern}|\text{Rare})P(\text{Rare})} {P(\text{Pattern}|\text{Rare})P(\text{Rare}) \, + \, P(\text{Pattern}|\text{Common})P(\text{Common})} \\[8pt]
&= \frac{0.98 \times 0.001} {0.98 \times 0.001 + 0.05 \times 0.999} \\[8pt]
&\approx 1.9\% \end{align}
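
The arithmetic can be checked directly, for example with the short calculation below (a restatement of the numbers above, not new data).

  # Direct check of the beetle calculation.
  p_pattern_given_rare = 0.98
  p_pattern_given_common = 0.05
  p_rare = 0.001
  p_common = 1 - p_rare

  p_rare_given_pattern = (p_pattern_given_rare * p_rare) / (
      p_pattern_given_rare * p_rare + p_pattern_given_common * p_common)
  print(round(100 * p_rare_given_pattern, 1))   # 1.9 (percent)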

Drug testing

Suppose a drug test is 99% sensitive and 99% specific. That is, the test produces 99% true positive results for drug users and 99% true negative results for non-users. Suppose that 0.5% of people are users of the drug. If an individual tests positive, what is the probability they are a user?


\begin{align}
P(\text{User}|\text{+}) &= \frac{P(\text{+}|\text{User}) P(\text{User})}{P(\text{+}|\text{User}) P(\text{User}) + P(\text{+}|\text{Non-user}) P(\text{Non-user})} \\[8pt]
&= \frac{0.99 \times 0.005}{0.99 \times 0.005 + 0.01 \times 0.995} \\[8pt]
&\approx 33.2\%
\end{align}

Despite the apparent accuracy of the test, if an individual tests positive, it is more likely that they do not use the drug than that they do.

This surprising result arises because the number of non-users is very large compared to the number of users, such that the number of false positives (0.995%) outweighs the number of true positives (0.495%). To use concrete numbers, if 1000 individuals are tested, there are expected to be 995 non-users and 5 users. From the 995 non-users, 0.01 \times 995 \approx 10 false positives are expected. From the 5 users, 0.99 \times 5 \approx 5 true positives are expected. Out of 15 positive results, only 5, about 33%, are genuine.
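
The same calculation, written out as a short check of the figures above.

  # Direct check of the drug-test calculation.
  sensitivity = 0.99        # P(+ | User)
  specificity = 0.99        # P(- | Non-user), so P(+ | Non-user) = 0.01
  prevalence = 0.005        # P(User)

  p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
  p_user_given_positive = sensitivity * prevalence / p_positive
  print(round(100 * p_user_given_positive, 1))   # 33.2 (percent)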

Historical remarks

Bayes' theorem was named after the Reverend Thomas Bayes (1702–61), who studied how to compute a distribution for the probability parameter of a binomial distribution (in modern terminology). His friend Richard Price edited and presented this work in 1763, after Bayes' death, as An Essay towards solving a Problem in the Doctrine of Chances.[3] The French mathematician Pierre-Simon Laplace reproduced and extended Bayes' results in 1774, apparently quite unaware of Bayes' work.[4] Stephen Stigler suggested in 1983 that Bayes' theorem was discovered by Nicholas Saunderson some time before Bayes.[5] Edwards (1986) disputed that interpretation.[6]

Bayes' preliminary results (Propositions 3, 4, and 5) imply the theorem that is named for him. In particular, Proposition 5 gives a simple description of conditional probability:

"If there be two subsequent events, the probability of the second b/N and the probability of both together P/N, and it being first discovered that the second event has also happened, from hence I guess that the first event has also happened, the probability I am right is P/b."

However, it does not appear that Bayes emphasized or focused on this finding. He presented his work as the solution to a problem:

"Given the number of times in which an unknown event has happened and failed [... Find] the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named."[3]

Bayes gave an example of a man trying to guess the ratio of "blanks" and "prizes" at a lottery. So far the man has watched the lottery draw ten blanks and one prize. Given these data, Bayes showed in detail how to compute the probability that the ratio of blanks to prizes is between 9:1 and 11:1 (the probability is low, about 7.7%). He went on to describe that computation after the man has watched the lottery draw twenty blanks and two prizes, forty blanks and four prizes, and so on. Finally, having drawn 10,000 blanks and 1,000 prizes, the probability reaches about 97%.[3]

Bayes' main result (Proposition 9) is the following in modern terms:

Assume a uniform prior distribution of the binomial parameter p. After observing m successes and n failures,

P(a<p<b|m;n) = \frac{\int_a^b {n+m \choose m} p^m (1-p)^n \,dp}{\int_0^1 {n+m \choose m} p^m (1-p)^n \,dp}.
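
Proposition 9 can be evaluated numerically. The sketch below applies it to the lottery example quoted above, treating a blank as a "success", so that m = 10, n = 1 and the blank-to-prize ratio lies between 9:1 and 11:1 exactly when p lies between 9/10 and 11/12; the midpoint-rule integration is an implementation choice for this sketch, not part of Bayes' argument.

  from math import comb

  # Proposition 9 applied to the lottery example: 10 blanks ("successes") and
  # 1 prize ("failure") observed; probability that the blank probability p lies
  # between 9/10 and 11/12 (blank-to-prize ratio between 9:1 and 11:1).
  def prob_p_between(a, b, m, n, steps=100_000):
      def integral(lo, hi):
          dp = (hi - lo) / steps
          return sum(comb(n + m, m)
                     * (lo + (i + 0.5) * dp) ** m
                     * (1 - (lo + (i + 0.5) * dp)) ** n
                     for i in range(steps)) * dp
      return integral(a, b) / integral(0.0, 1.0)

  print(round(prob_p_between(9/10, 11/12, m=10, n=1), 3))   # about 0.077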

It is unclear whether Bayes was a "Bayesian" in the modern sense, that is, whether he was interested in Bayesian inference or merely in probability. Proposition 9 seems "Bayesian" in its presentation as a probability about the parameter p. However, Bayes stated his question in a manner that suggests a frequentist viewpoint: he supposed that a billiard ball is thrown at random onto a billiard table, and considered further billiard balls that fall above or below the first ball with probabilities p and 1-p. The algebra is, of course, identical whichever view is taken.

Stephen Fienberg describes the evolution from "inverse probability" at the time of Bayes and Laplace, a term still used by Harold Jeffreys (1939), to "Bayesian" in the 1950s.[7] Ironically, Ronald A. Fisher introduced the "Bayesian" label in a derogatory sense.

Richard Price and the existence of God

Richard Price discovered Bayes' essay and its now-famous theorem in Bayes' papers after Bayes' death. He believed that Bayes' Theorem helped prove the existence of God ("the Deity") and wrote the following in his introduction to the essay:

"The purpose I mean is, to shew what reason we have for believing that there are in the constitution of things fixt laws according to which things happen, and that, therefore, the frame of the world must be the effect of the wisdom and power of an intelligent cause; and thus to confirm the argument taken from final causes for the existence of the Deity. It will be easy to see that the converse problem solved in this essay is more directly applicable to this purpose; for it shews us, with distinctness and precision, in every case of any particular order or recurrency of events, what reason there is to think that such recurrency or order is derived from stable causes or regulations in nature, and not from any irregularities of chance." (Philosophical Transactions of the Royal Society of London, 1763)[3]

In modern terms this is an instance of the teleological argument.

Notes

  1. McGrayne, Sharon Bertsch (2011). The Theory That Would Not Die. p. 10.
  2. Jaynes, Edwin T. (2003). Probability Theory: The Logic of Science. Cambridge University Press. ISBN 9780521592710.
  3. Bayes, Thomas, and Price, Richard (1763). "An Essay towards solving a Problem in the Doctrine of Chances. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, M. A. and F. R. S.". Philosophical Transactions of the Royal Society of London 53: 370–418. doi:10.1098/rstl.1763.0053. http://www.stat.ucla.edu/history/essay.pdf
  4. Daston, Lorraine (1988). Classical Probability in the Enlightenment. Princeton University Press. p. 268. ISBN 0-691-08497-1.
  5. Stigler, Stephen M. (1983). "Who Discovered Bayes' Theorem?". The American Statistician 37 (4): 290–296.
  6. Edwards, A. W. F. (1986). "Is the Reference in Hartley (1749) to Bayesian Inference?". The American Statistician 40 (2): 109–110.
  7. Fienberg, Stephen E. (2006). "When Did Bayesian Inference Become 'Bayesian'?".
